Data Analysis and Machine Learning with Kaggle by Konrad Banachewicz & Luca Massaron

Author: Konrad Banachewicz & Luca Massaron [Banachewicz, Konrad & Massaron, Luca]
Language: eng
Format: epub
Tags: Kaggle for beginners, kaggle competition, kaggle exercise, kaggle cheat sheet, data analytics, ml, machine learning, ai, artificial intelligence, python data analysis, data analysis competitions
Publisher: Packt Publishing
Published: 2021-10-30T00:00:00+00:00


Computational resources

Some competitions impose limits so that the resulting solutions are feasible to put into production. For instance, the Bosch Production Line Performance competition (https://www.kaggle.com/c/bosch-production-line-performance) had strict limits on execution time, model file output, and memory for your solution. Kernel-based competitions that require both training and inference to be executed on Kernels do not pose a resource problem, because Kaggle provides all the resources you need (this is also intended as a way to put all participants on an equal footing, for a better competition result).
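To get a feel for how such limits bite, you can time a candidate solution and track its peak memory before submitting. The following is a minimal sketch using only the Python standard library; the helper name `run_within_budget` and the limit values are hypothetical and do not reflect the rules of any actual competition. Note that `tracemalloc` only sees memory allocated through Python's allocator, so the figure is a lower bound for code that relies on native libraries.

```python
import time
import tracemalloc


def run_within_budget(func, max_seconds, max_megabytes):
    """Run func() and report runtime and peak Python memory.

    max_seconds and max_megabytes are hypothetical limits chosen by you;
    they are not taken from any real competition.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    peak_mb = peak_bytes / 1024 ** 2
    return {
        "result": result,
        "seconds": elapsed,
        "peak_mb": peak_mb,
        "within_limits": elapsed <= max_seconds and peak_mb <= max_megabytes,
    }


if __name__ == "__main__":
    # Stand-in for a real training or inference routine
    report = run_within_budget(
        lambda: sum(range(1_000_000)), max_seconds=60, max_megabytes=1024
    )
    print(f"{report['seconds']:.3f}s, peak {report['peak_mb']:.2f} MB, "
          f"within limits: {report['within_limits']}")
```

Wrapping the whole pipeline in one callable keeps the measurement honest: data loading and feature engineering count toward the budget as well as model fitting.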

Problems arise in Kernel competitions where only inference is restricted to Kernels: you can train your models on your own machine, and the only limit, applied at test time, is on the number and complexity of the models you produce. Since at the moment most competitions require deep learning solutions, you will need specialized hardware such as GPUs in order to achieve interesting results. Even if you participate in one of the now rare tabular competitions, you’ll soon realize that you need a strong machine with many processors and plenty of memory in order to apply feature engineering to data, run experiments, and build models quickly.
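Before planning experiments, it helps to know what the machine in front of you actually offers. The sketch below, which uses only the Python standard library, reports the number of CPU cores, the physical memory (on Linux and other POSIX systems that expose it through `os.sysconf`), and whether the `nvidia-smi` tool is on the path. The function name `describe_machine` is our own, and finding `nvidia-smi` is only a hint that an NVIDIA driver is installed, not proof that a GPU is usable.

```python
import os
import shutil


def describe_machine():
    """Return a rough summary of the local compute resources."""
    cores = os.cpu_count() or 1  # cpu_count() may return None

    try:
        # Available on Linux and most POSIX systems; not on Windows
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        ram_gb = page_size * page_count / 1024 ** 3
    except (AttributeError, ValueError, OSError):
        ram_gb = None  # unknown on this platform

    return {
        "cpu_cores": cores,
        "ram_gb": ram_gb,
        # Presence of the tool suggests an NVIDIA driver is installed
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_machine().items():
        print(f"{key}: {value}")
```

Running the same snippet on your own machine and in a hosted notebook gives a quick, like-for-like comparison of the two environments.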

Standards change rapidly, so it is difficult to name a standard hardware configuration that would let you compete in the same league as others. We can, however, get a hint of such a standard by looking at what other competitors are using, either as their own machine or as a machine in the cloud.

For instance, HP recently launched a program in which it awarded an HP Z4 or Z8 to a few selected Kaggle participants in exchange for visibility for its brand. A Z8 machine has 56 cores, 3TB of memory, 48TB of storage (a good share of it on solid-state drives), and an NVIDIA RTX GPU. We understand that such a machine is out of reach for many, and that even renting a similar machine for a short time on a cloud platform such as Google’s GCP or Amazon’s AWS is out of the question, given the expense of even moderate usage.

Our suggestion, unless your ambition is to climb to the top rankings of Kaggle participants, is therefore to use the machines provided free by Kaggle: the Kaggle Notebooks (previously known as Kaggle Kernels).

Kaggle Notebooks are a versioned computational environment, based on Docker containers running on cloud machines, which allows you to write and execute both scripts and notebooks in R and Python. Kaggle Notebooks are integrated into the Kaggle environment (you can make submissions from them and keep track of which submission refers to which Notebook), they come with most data science packages pre-installed, and they allow some customization (you can download files and install further packages). The basic Kaggle Notebook is just CPU based, but


